
Low-Resource NMT: A Case Study on the Written and Spoken Languages in Hong Kong

Mak, Hei Yi, Lee, Tan

arXiv.org Artificial Intelligence

The majority of inhabitants in Hong Kong are able to read and write in standard Chinese but use Cantonese as the primary spoken language in daily life. Spoken Cantonese can be transcribed into Chinese characters, which constitute the so-called written Cantonese. Written Cantonese exhibits significant lexical and grammatical differences from standard written Chinese. The rise of written Cantonese is increasingly evident in the cyber world. The growing interaction between Mandarin speakers and Cantonese speakers is leading to a clear demand for automatic translation between Chinese and Cantonese. This paper describes a transformer-based neural machine translation (NMT) system for written-Chinese-to-written-Cantonese translation. Given that parallel text data of Chinese and Cantonese are extremely scarce, a major focus of this study is the effort of preparing a good amount of training data for NMT. In addition to collecting 28K parallel sentences from previous linguistic studies and scattered internet resources, we devise an effective approach to obtaining 72K parallel sentences by automatically extracting pairs of semantically similar sentences from parallel articles on Chinese Wikipedia and Cantonese Wikipedia. We show that leveraging highly similar sentence pairs mined from Wikipedia improves translation performance on all test sets. Our system outperforms Baidu Fanyi's Chinese-to-Cantonese translation on 6 out of 8 test sets in BLEU scores. Translation examples reveal that our system is able to capture important linguistic transformations between standard Chinese and spoken Cantonese.
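The mining step described in the abstract, extracting semantically similar sentence pairs from parallel Chinese and Cantonese Wikipedia articles, can be sketched with sentence embeddings and cosine similarity. The greedy one-best matching and the similarity threshold below are illustrative assumptions, not the paper's exact procedure; in practice the embeddings would come from a multilingual sentence encoder.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors; 0.0 if either is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    # Greedy mining: for each source-sentence embedding, keep the single
    # best-matching target sentence if its similarity exceeds the threshold.
    pairs = []
    for i, sv in enumerate(src_vecs):
        best_j, best_sim = None, threshold
        for j, tv in enumerate(tgt_vecs):
            sim = cosine(sv, tv)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            pairs.append((i, best_j, best_sim))
    return pairs
```

Filtering on a high threshold trades recall for precision, which matches the abstract's emphasis on "highly similar" pairs improving translation quality.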


Reviews: Unsupervised Learning of Spoken Language with Visual Context

Neural Information Processing Systems

This is interesting work that points in the right direction, but a few aspects of this paper are a bit problematic: 1) It would have been useful (or interesting) to use a corpus that has existing text captions, and either have users re-speak the text captions or collect additional captions. The data collection seems generally well thought-out, but why was the Places205 dataset used? Prompted speech (such as that collected here) is not "spontaneous"; otherwise the WSJ recognizer would not have given 20% WER (this aspect is irrelevant for the purpose of this paper, though, I think). Typically, multiple captions are generated for a single image. Has this been done here as well? Or is there only a single caption for each image?


The World's Top 10 Most Spoken Languages

#artificialintelligence

The Amazon growth story has been a remarkable one so far. On the top line, the company has grown every single year since its inception. Even going back to 2004, Amazon generated a much more modest $6.9 billion in revenue, compared with the massive $469 billion for 2021. Most of these sales come from the retail and e-commerce operations the company has come to be known for. Yet 74% of Amazon's operating profit comes from Amazon Web Services (AWS).


Ranked: The 100 Most Spoken Languages Around the World

#artificialintelligence

Even though you're reading this article in English, there's a good chance it might not be your mother tongue. Of the billion-strong English speakers in the world, only 33% consider it their native language. The popularity of a language depends greatly on utility and geographic location. Additionally, how we measure the spread of world languages can vary greatly depending on whether you look at total speakers or native speakers. Today's detailed visualization from WordTips illustrates the 100 most spoken languages in the world, the number of native speakers for each language, and the origin tree that each language has branched out from.


Unsupervised Learning of Spoken Language with Visual Context

Harwath, David, Torralba, Antonio, Glass, James

Neural Information Processing Systems

Humans learn to speak before they can read or write, so why can't computers do the same? In this paper, we present a deep neural network model capable of rudimentary spoken language acquisition using untranscribed audio training data, whose only supervision comes in the form of contextually relevant visual images. We describe the collection of our data comprised of over 120,000 spoken audio captions for the Places image dataset and evaluate our model on an image search and annotation task. We also provide some visualizations which suggest that our model is learning to recognize meaningful words within the caption spectrograms.


Which Is The World's Most Spoken Language? Terpene. What's That, You Ask?

International Business Times

China, the most populous country in the world, has close to one billion people who speak Mandarin. Spanish is spoken by less than half that number, primarily in Mexico, Spain and countries in South America. English follows close behind, with Hindi in India and Arabic in the Middle East making up the top five. Or so you would think. The most common language in the world is actually not human at all.